23/09/2021

Welcome to Data Handling: I.C.V. 2021!

  • Fire up your notebooks!
  • Go to this page: http://bit.ly/datahandling-2021
  • Use one row to respond to the questions in the column headers (see the first two rows for examples).

Introductory Example

Data input, processing, output

The Data Pipeline

  • Research report/paper (e.g., BA Thesis)
  • Presentation/Slides
  • Website
  • Web application (interactive; as in the introductory example)
  • Dashboard for management
  • Recommender system (i.e., a trained machine learning algorithm)

‘Data Science?’

“This coupling of scientific discovery and practice involves the collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of scientific, translational, and inter-disciplinary applications.”

University of Michigan ‘Data Science Initiative,’ 2015

But, what about statistics?!

“Seemingly, statistics is being marginalized here; the implicit message is that statistics is a part of what goes on in data science but not a very big part. At the same time, many of the concrete descriptions of what the DSI will actually do will seem to statisticians to be bread-and-butter statistics. Statistics is apparently the word that dare not speak its name in connection with such an initiative!”

David Donoho (2015). 50 years of Data Science

Background

What’s new about all this?

“All in all, I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”

John Tukey (The Future of Data Analysis, 1962!)

Technological change

Relevance for modern economic research

Economy/Business

Organisation of the Course

Our Team - At Your Service

Aurélien Sallin Michael Tüting Ulrich Matter

Course Structure

Course components I

  • Lectures (Thursday morning)
    • Background/Concepts
    • Live demonstrations of concepts
    • Illustration of ‘hands-on’ approaches
  • Exercises (handed out every other week)
    • Some conceptual questions (as they appear in the exam)
    • Hands-on exercises/tutorials in R
    • Detailed solution videos
    • The first exercise set (setting up R/RStudio) is available on StudyNet/Canvas today

Course components II

  • Workshops/Exercises (bi-weekly evening sessions)
    • Discussion of exercises and additional input
    • Recap of theoretical concepts
    • Q&A, support
  • Guest lecture and research insights

Course concept

  • Lectures (every Thursday morning)
    • Background/Concepts
    • Live demonstrations of concepts
    • Illustration of ‘hands-on’ approaches
  • Workshops/Exercises (bi-weekly evening sessions)
    • Guided tutorials
    • Discussion of homework exercises
    • Recap of theoretical concepts
    • The first exercise set (setting up R/RStudio) is available on StudyNet/Canvas today

Course concept

  • Learning mode in this course: attend the lecture, recap key concepts in the lecture notes (self-study), work on the exercises, watch the solution video, come to the exercise session, repeat…

  • Strongly encouraged: (virtual) learning groups!

    • The biweekly exercises provide an opportunity for this.
    • Tackle the tricky exercises together!

Part I: Data (Science) fundamentals

Date        Topic
23.09.2021  Introduction: Big Data/Data Science, course overview
30.09.2021  An introduction to data and data processing
30.09.2021  Exercises/Workshop 1: Tools, working with text files
07.10.2021  Data storage and data structures
14.10.2021  Big Data from the Web
14.10.2021  Exercises/Workshop 2: Computer code and data storage
21.10.2021  Programming with data

Part II: Data gathering and preparation

Date        Topic
28.10.2021  Research insights
28.10.2021  Exercises/Workshop 3: Programming with Data
NA          Semester Break
NA          Semester Break
18.11.2021  Data sources, data gathering, data import
25.11.2021  Data preparation and manipulation
25.11.2021  Exercises/Workshop 4: Data import and data preparation/manipulation

Part III: Analysis, visualisation, output

Date        Topic
02.12.2021  Basic statistics and data analysis with R
09.12.2021  Guest Lecture (“Data Science in Insurance”)
09.12.2021  Exercises/Workshop 5: Applied data analysis with R
16.12.2021  Visualisation, dynamic documents
21.12.2021  Exercises/Workshop 6: Visualisation, dynamic documents
23.12.2021  Summary, Wrap-Up, Q&A, Feedback
23.12.2021  Exam for Exchange Students

Core course resources

  • All information and materials (notes, slides, course sheet, syllabus, etc.) are available on StudyNet/Canvas.
  • Exercises will be handed out via GitHub Classroom!
  • Solutions to the exercises will be made available on Canvas.
  • This course is open source: all raw materials (code, source code for slides, notes, etc.) are freely available on GitHub.

Main textbooks

Further resources

Exam information

  • Central, written examination.
  • Multiple choice questions.
  • A few open questions.
  • Theoretical concepts and practical applications in R (questions based on code examples).

Exam information II

  • Exercises towards the end of the term will contain sample questions.
    • Get familiar with the style/format of questions.
  • Exchange students who need to take the exam before the central exam block: the course schedule lists an ‘Exam for Exchange Students’ on 23.12.2021.

Q&A

References